AutoML: a High-level Interface for Developers
- ML is an exponential trend in society and technology, powered by other trends
- This is a return on cross-generational investments
- ML is making people reconsider what software is and how it is created
- You don’t need to be a data scientist to build useful predictive models
- AutoML is an efficient, flexible interface to custom ML
- There are no universal algorithms, but some are very useful
- Competition in the AutoML market is heating up:
- GCP has a simple, practical interface
- AutoML is no silver bullet
- AutoML will be an important factor in supporting the exponential spread of ML across society
ML is an exponential trend in society and technology, powered by other trends
Data, compute, and math have historically been the bottlenecks to wider adoption and larger impact
This is a return on cross-generational investments
Blaise Pascal:
Historically, most applications were in science, with some in industry:
- Radius of the earth
- Orbits of planets
- Census statistics
- Gambling
- Insurance
- Navigation
- Agriculture
- Assembly lines
ML is making people reconsider what software is and how it is created
Software 1.0: (deterministic)
- inherently deterministic
- errors unacceptable
- little data, or data is hard to collect
- ability to guess the complete algorithm and iterate on the design with some metric in mind
- guessed algorithm is underperforming and practically infeasible
Software 2.0: (probabilistic)
- inherently probabilistic
- errors are acceptable
- data is easy to collect, or large amounts are available
- inability to imagine the steps or
- high performance requires exponentially increasing number of steps covering a variety of corner cases
You don’t need to be a data scientist to build useful predictive models
Interfaces to ML:
- APIs with canned models (especially text, images, sounds)
- AutoML (Google, H2O)
- Robust open-source libraries (XGBoost)
- Modeling and inference frameworks (TensorFlow)
AutoML is an efficient, flexible interface to custom ML
AutoML is intended to:
- Open ML to non-experts
- Make sure a decent model is built and deployed
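One way to picture "make sure a decent model is built" is a loop that fits several candidate models, scores each on held-out data, and keeps the winner. The sketch below is a toy illustration of that idea only, not how any AutoML product works internally. The candidate models, the `auto_select` helper, and the synthetic data are all assumptions made up for this example.

```python
import random
from statistics import mean

def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Candidate: ordinary least squares on a single feature."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_nearest(xs, ys):
    """Candidate: 1-nearest-neighbour, copies the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def auto_select(xs, ys, candidates, valid_fraction=0.25, seed=0):
    """Hypothetical helper: fit every candidate on a training split and
    return the name with the lowest validation error, plus all scores."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_valid = max(1, int(len(idx) * valid_fraction))
    valid, train = idx[:n_valid], idx[n_valid:]
    tx, ty = [xs[i] for i in train], [ys[i] for i in train]
    vx, vy = [xs[i] for i in valid], [ys[i] for i in valid]
    scores = {name: mse(fit(tx, ty), vx, vy) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores

CANDIDATES = {"mean": fit_mean, "linear": fit_linear, "nearest": fit_nearest}

# Synthetic, exactly linear data (an assumption for the demo).
xs = [float(i) for i in range(40)]
ys = [3.0 * x + 2.0 for x in xs]
best, scores = auto_select(xs, ys, CANDIDATES)
print(best, scores)
```

On this synthetic data the linear candidate should win, since the targets are an exact line. Real AutoML systems search far larger spaces (model families, hyperparameters, feature transforms), but the select-by-validation-score pattern is the part a non-expert no longer has to write by hand.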
There are no universal algorithms, but some are very useful
- Tabular data dominates in terms of usefulness and sheer variety
- Regression/classification models take advantage of it
| Application | Input | Output |
|---|---|---|
| Spam filtering | Text message | Pass/block |
| Online advertising | Ad, user info | Click/Skip |
| Email routing | Email text | Support line |
| Wait time | Queue features | Minutes to wait |
| Employee scheduling | Time, store, role | Number of employees |
| Cybersecurity | Computer behavior | Compromised/normal |
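To make the input/output framing in the table concrete, here is a minimal sketch of the first row (text message in, pass/block out) using a small naive Bayes classifier in plain Python. Naive Bayes is swapped in here purely for illustration; the source does not say which algorithm is used. The toy messages, function names, and the 0.5 threshold are assumptions.

```python
import math
from collections import Counter

def train(messages):
    """messages: list of (text, label) pairs with label "spam" or "ham".
    Assumes both labels appear at least once."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def spam_probability(model, text):
    """Posterior probability that `text` is spam, with add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in ("spam", "ham"):
        n_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        log_scores[label] = score
    # Normalise in log space to avoid underflow on long messages.
    top = max(log_scores.values())
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    return weights["spam"] / (weights["spam"] + weights["ham"])

def decide(model, text, threshold=0.5):
    """Map the probability to the table's Pass/block output."""
    return "block" if spam_probability(model, text) >= threshold else "pass"

# Toy training data, invented for this example.
TOY = [
    ("win a free prize now", "spam"),
    ("free cash claim now", "spam"),
    ("are we meeting for lunch", "ham"),
    ("see you at the meeting tomorrow", "ham"),
]
model = train(TOY)
print(decide(model, "claim your free prize"))
print(decide(model, "lunch meeting tomorrow"))
```

The other rows follow the same shape: a row of input features, a single output column, and a model that maps one to the other. Only the output type changes (a class for spam or cybersecurity, a number for wait time or staffing).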
Competition in the AutoML market is heating up:
- AWS SageMaker
- Google AutoML
- Microsoft ML Studio
- DataRobot
- H2O
- Aible
GCP has a simple, practical interface
Examples:
- Text message spam probabilities and classification
- Real-estate sales price prediction
AutoML is no silver bullet
Positives:
- workflow
- data splits
- good model implementations
- infra management
- path to deployment and value
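As an illustration of the "data splits" item, the sketch below shows the kind of bookkeeping an AutoML service takes off the developer's hands: a reproducible train/validation/test partition. The `split_rows` helper, the 80/10/10 proportions, and the seed are assumptions for this example, not the defaults of any particular product.

```python
import random

def split_rows(rows, valid_fraction=0.1, test_fraction=0.1, seed=42):
    """Hypothetical helper: shuffle rows reproducibly and cut them into
    train / validation / test partitions that do not overlap."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    n_valid = round(len(rows) * valid_fraction)
    test = rows[:n_test]
    valid = rows[n_test:n_test + n_valid]
    train = rows[n_test + n_valid:]
    return train, valid, test

train_rows, valid_rows, test_rows = split_rows(range(100))
print(len(train_rows), len(valid_rows), len(test_rows))
```

A random shuffle is only appropriate when rows are independent. For time-ordered data, a chronological cut is usually the safer choice.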
Negatives:
- model limitations
- irrelevant differences, compute waste
- when something breaks, it is hard to debug and iterate within the environment